A NOTE ON TALAGRAND'S CONVEX HULL 
CONCENTRATION INEQUALITY 



DAVID POLLARD 



ABSTRACT. The paper reexamines an argument by Talagrand
that leads to a remarkable exponential tail bound for the
concentration of probability near a set. The main novelty is the
replacement of a mysterious Calculus inequality by an
application of Jensen's inequality.



1. Introduction 

Let $\mathcal{X}$ be a set equipped with a sigma-field $\mathcal{A}$. For each vector
$w = (w_1, \dots, w_n)$ in $\mathbb{R}^n_+$, the weighted Hamming distance between two vectors
$x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ in $\mathcal{X}^n$ is defined as

$d_w(x, y) := \sum_{i \le n} w_i h_i(x, y)$ where $h_i(x, y) = \begin{cases} 1 & \text{if } x_i \ne y_i \\ 0 & \text{otherwise.} \end{cases}$

For a subset $A$ of $\mathcal{X}^n$ and $x \in \mathcal{X}^n$, the distances $d_w(x, A)$ and $D(x, A)$ are
defined by

$d_w(x, A) := \inf\{d_w(x, y) : y \in A\}$

and

$D(x, A) := \sup_{w \in \mathcal{W}} d_w(x, A)$,

where the supremum is taken over all weights in the set

$\mathcal{W} := \{(w_1, \dots, w_n) : w_i \ge 0 \text{ for each } i \text{ and } |w|^2 := \sum_{i \le n} w_i^2 \le 1\}$.
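For a finite set $A$ the two definitions above can be evaluated directly. The following is a minimal Python sketch; the function names `h`, `d_w`, `d_w_set` and the example data are illustrative assumptions, not part of the paper.

```python
def h(x, y):
    """Indicator vector h(x, y): entry i is 1 when x_i != y_i, else 0."""
    return [1 if xi != yi else 0 for xi, yi in zip(x, y)]


def d_w(w, x, y):
    """Weighted Hamming distance: sum of w_i over coordinates where x and y differ."""
    return sum(wi * hi for wi, hi in zip(w, h(x, y)))


def d_w_set(w, x, A):
    """d_w(x, A): smallest weighted Hamming distance from x to a point of the finite set A."""
    return min(d_w(w, x, y) for y in A)


# Hypothetical example data (not from the paper).
x = (0, 0, 0)
A = [(1, 0, 0), (0, 1, 1)]
w = (0.6, 0.0, 0.8)  # nonnegative weights with |w|^2 = 0.36 + 0.64 = 1

dist = d_w_set(w, x, A)
print(dist)
```

Because this particular $w$ belongs to $\mathcal{W}$, the computed value is a lower bound for $D(x, A)$; the supremum over all of $\mathcal{W}$ may be larger.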



Talagrand (1995, Section 4.1) proved a remarkable concentration inequality
for random elements $X = (X_1, \dots, X_n)$ of $\mathcal{X}^n$ with independent
coordinates and subsets $A \in \mathcal{A}^n$:

(1)  $\mathbb{P}\{X \in A\}\,\mathbb{P}\{D(X, A) \ge t\} \le \exp(-t^2/4)$ for all $t \ge 0$.

As Talagrand showed, this inequality has many applications to problems in
combinatorial optimization and other areas. See Talagrand (1996b),
Steele (1997, Chapter 6), and McDiarmid (1998, Section 4) for further
examples.

Talagrand used an induction on n to establish his result, invoking a slightly
mysterious Calculus lemma in the inductive step. There has been a strong
push in the literature to establish concentration and deviation inequalities
by "more intuitive" methods, such as those based on tensorization, as in
Ledoux (1996), Boucheron, Lugosi, and Massart (2000), Massart (2003), and
Lugosi (2003).



It is my purpose in this note to modify Talagrand's proof, adapting an
idea from Talagrand (1996a, Section 3), so that the inductive step becomes
a simple application of the Hölder inequality (essentially as in the original
proof) and the Jensen inequality.

The distance $D(x, A)$ has another representation, as a minimization over
a convex subset of $[0, 1]^n$. Write $h(x, y)$ for the point of $\{0, 1\}^n$ with ith
coordinate $h_i(x, y)$. For each fixed $x$, the function $h(x, \cdot)$ maps $A$ onto a subset
$h(x, A) := \{h(x, y) : y \in A\}$ of $\{0, 1\}^n$. The convex hull $\mathrm{co}(h(x, A))$
of $h(x, A)$ in $[0, 1]^n$ is compact, and

$D(x, A) = \inf\{|\xi| : \xi \in \mathrm{co}(h(x, A))\}$.

Each point $\xi$ of $\mathrm{co}(h(x, A))$ can be written as $\int h(x, y)\,\nu(dy)$ for a $\nu$ in the
set $\mathcal{P}(A)$ of all Borel probability measures for which $\nu(A) = 1$. That is,
$\xi_i = \nu\{y \in A : y_i \ne x_i\}$. Thus

(2)  $D(x, A)^2 = \inf_{\nu \in \mathcal{P}(A)} \sum_{i \le n} \left(\nu\{y \in A : y_i \ne x_i\}\right)^2$.
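For a finite set $A$, the convex hull representation turns the computation of $D(x, A)$ into a minimum-norm problem over a polytope. The sketch below approximates it with a Frank-Wolfe iteration using exact line search. This is my choice of numerical method for illustration, not something used in the paper; the name `convex_distance` and the example sets are hypothetical.

```python
import math


def h(x, y):
    """Indicator vector h(x, y) as floats."""
    return [1.0 if xi != yi else 0.0 for xi, yi in zip(x, y)]


def convex_distance(x, A, iters=1000, tol=1e-12):
    """Approximate D(x, A) = min |xi| over co(h(x, A)) for a finite set A.

    Frank-Wolfe on f(xi) = |xi|^2 / 2 with exact line search.
    """
    vertices = [h(x, y) for y in A]
    xi = list(vertices[0])
    for _ in range(iters):
        # Vertex minimizing the linearization <xi, v> of f at the current point.
        v = min(vertices, key=lambda u: sum(a * b for a, b in zip(xi, u)))
        d = [vj - xj for vj, xj in zip(v, xi)]
        gap = -sum(xj * dj for xj, dj in zip(xi, d))  # Frank-Wolfe duality gap
        dd = sum(dj * dj for dj in d)
        if gap <= tol or dd == 0.0:
            break
        step = min(1.0, gap / dd)  # exact minimizer of f along the segment
        xi = [xj + step * dj for xj, dj in zip(xi, d)]
    return math.sqrt(sum(xj * xj for xj in xi))


# Hypothetical examples (not from the paper).
D_two = convex_distance((0, 0), [(1, 0), (0, 1)])
D_three = convex_distance((0, 0, 0), [(1, 0, 0), (0, 1, 1)])
D_inside = convex_distance((0, 0), [(0, 0), (1, 1)])
print(D_two, D_three, D_inside)
```

Frank-Wolfe converges slowly in general, so the returned value should be read as an approximation whose accuracy depends on `iters` and `tol`.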



Talagrand actually proved inequality (1) by showing that

(3)  $\mathbb{P}\{X \in A\}\,\mathbb{P}\exp\left(D(X, A)^2/4\right) \le 1$.

He also established an even stronger result, in which the $D(X, A)^2/4$ in (3)
is replaced by a more complicated distance function.
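Inequality (3) can be checked exactly in one simple case. When $A$ is a single point $a$, the hull $\mathrm{co}(h(x, A))$ is the single point $h(x, a)$, so $D(x, A)^2$ is just the number of coordinates in which $x$ and $a$ differ. The Python sketch below enumerates $\{0,1\}^n$ under the uniform distribution; the choice $n = 4$ and the point `a` are hypothetical.

```python
import math
from itertools import product

n = 4
a = (0,) * n  # the single point of A (hypothetical choice)
points = list(product((0, 1), repeat=n))
prob_A = 1.0 / len(points)  # P{X in A} for X uniform on {0,1}^n


def D_squared(x):
    """D(x, {a})^2: number of coordinates where x differs from a."""
    return sum(1 for xi, ai in zip(x, a) if xi != ai)


# P exp(D(X, A)^2 / 4), computed by enumeration.
mean_exp = sum(math.exp(D_squared(x) / 4.0) for x in points) / len(points)
lhs = prob_A * mean_exp
print(lhs, lhs <= 1.0)
```

In this example the left-hand side of (3) factorizes as $((1 + e^{1/4})/4)^n$, which is well below 1.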

For each convex, increasing function $\psi$ with $\psi(0) = 0 = \psi'(0)$ define

(4)  $F_\psi(x, A) := \inf_{\nu \in \mathcal{P}(A)} \sum_{i \le n} \psi\left(\nu\{y \in A : y_i \ne x_i\}\right)$.



For each $c > 0$, Talagrand (1995, Section 4.2) showed that

(5)  $\left(\mathbb{P}\{X \in A\}\right)^{1/c}\,\mathbb{P}\exp\left(F_{\psi_c}(X, A)\right) \le 1$,

where

(6)  $\psi_c(\theta) := c^{-1}\left((1 - \theta)\log(1 - \theta) - (1 - \theta + c)\log\left(\frac{(1 - \theta) + c}{1 + c}\right)\right) \ge \frac{\theta^2}{2 + 2c}$.

As you will see in Section 3, this strange function is actually the largest
solution to a differential inequality,

$\psi''(1 - \theta) \le 1/(\theta^2 + \theta c)$ for $0 < \theta < 1$.

Inequality (5) improves on (3) because $D(x, A)^2/4 \le F_{\psi_1}(x, A)$.
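The lower bound attached to definition (6) is easy to test numerically. The Python sketch below assumes my reading of (6), namely $\psi_c(\theta) = c^{-1}[(1-\theta)\log(1-\theta) - (1-\theta+c)\log((1-\theta+c)/(1+c))]$, with the convention $0 \log 0 = 0$, and checks $\psi_c(\theta) \ge \theta^2/(2+2c)$ on a grid. The grid and the values of $c$ are arbitrary choices.

```python
import math


def psi_c(theta, c):
    """psi_c(theta) as read from definition (6), with 0 * log(0) taken as 0."""
    u = 1.0 - theta
    first = u * math.log(u) if u > 0.0 else 0.0
    return (first - (u + c) * math.log((u + c) / (1.0 + c))) / c


grid = [k / 100.0 for k in range(101)]

# Smallest value of psi_c(theta) - theta^2 / (2 + 2c) over the grid and a few c.
worst_gap = min(
    psi_c(t, c) - t * t / (2.0 + 2.0 * c)
    for c in (0.5, 1.0, 2.0)
    for t in grid
)
print(worst_gap)
```

With $c = 1$ the bound reads $\psi_1(\theta) \ge \theta^2/4$, which is the comparison behind $D(x, A)^2/4 \le F_{\psi_1}(x, A)$.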

Following the lead of Talagrand (1995, Section 4.4), we can ask for general
conditions on the convex $\psi$ under which an analog of (5) holds with
some other decreasing function of $\mathbb{P}\{X \in A\}$ as an upper bound. The
following modification of Talagrand's theorems gives a sufficient condition in
a form that serves to emphasize the role played by Jensen's inequality.

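Theorem 1. Suppose $\gamma$ is a decreasing function with $\gamma(0) = \infty$ and $\psi$
is a convex function. Define $G(\eta, \theta) := \psi(1 - \theta) + \theta\eta$ and
$G(\eta) := \inf_{0 \le \theta \le 1} G(\eta, \theta)$ for $\eta \in \mathbb{R}_+$. Suppose

(i) $r \mapsto \exp\left(G(\gamma(r) - \gamma(r_0))\right)$ is concave on $[0, r_0]$, for each $r_0 \le 1$;

(ii) $(1 - p)e^{\psi(1)} + p \le e^{\gamma(p)}$ for $0 \le p \le 1$.

Then

$\mathbb{P}\exp\left(F_\psi(X, A)\right) \le \exp\left(\gamma(\mathbb{P}\{X \in A\})\right)$

for every $A \in \mathcal{A}^n$ and every random element $X$ of $\mathcal{X}^n$ with independent
components.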
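To make the function $G$ concrete, consider the choice $\psi(\theta) = \theta^2/4$ with $\gamma(r) = \log(1/r)$. A short calculation (mine, under that assumption) gives $G(\eta) = \eta - \eta^2$ for $\eta < 1/2$ and $G(\eta) = 1/4$ otherwise. The Python sketch below compares this closed form with a brute-force grid minimization and tests assumption (ii) on a grid of $p$ values; the helper names are hypothetical.

```python
import math


def psi(theta):
    """The assumed choice psi(theta) = theta^2 / 4."""
    return theta * theta / 4.0


def G_grid(eta, steps=2000):
    """G(eta) = inf over theta in [0, 1] of psi(1 - theta) + theta * eta, by grid search."""
    return min(psi(1.0 - k / steps) + (k / steps) * eta for k in range(steps + 1))


def G_closed(eta):
    """Closed form derived for psi(theta) = theta^2 / 4."""
    return eta - eta * eta if eta < 0.5 else 0.25


etas = [0.0, 0.1, 0.3, 0.5, 1.0, 3.0]
max_diff = max(abs(G_grid(e) - G_closed(e)) for e in etas)

# Assumption (ii) with gamma(p) = log(1/p): (1 - p) e^{psi(1)} + p <= 1/p.
ps = [k / 1000.0 for k in range(1, 1001)]
ii_holds = all((1.0 - p) * math.exp(psi(1.0)) + p <= 1.0 / p + 1e-12 for p in ps)
print(max_diff, ii_holds)
```

The grid search is only a sanity check on the closed form; it also illustrates the inequality $G(\eta) \le \eta$, which comes from evaluating the infimum at $\theta = 1$.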

The following lemma, a more general version of which is proved in
Section 3, leads to a simple sufficient condition for the concavity
assumption (i) of Theorem 1 to hold.

Lemma 2 (Concavity lemma). Suppose $\psi : [0, 1] \to \mathbb{R}_+$ is convex and
increasing, with $\psi(0) = 0 = \psi'(0)$ and $\psi''(\theta) > 0$ for $0 < \theta < 1$.
Suppose $\xi : [0, r_0] \to \mathbb{R}_+ \cup \{\infty\}$ is continuous and twice differentiable
on $(0, r_0)$. Suppose also that there exists some finite constant $c$ for which
$\xi''(r) \le c\,\xi'(r)^2$ for $0 < r < r_0$. If

$\psi''(1 - \theta) \le 1/(\theta^2 + \theta c)$ for $0 < \theta < 1$

then the function $r \mapsto \exp\left(G(\xi(r))\right)$ is concave on $[0, r_0]$.

The Lemma will be applied with $\xi(r) = \gamma(r) - \gamma(r_0)$ for $0 \le r \le r_0$. As
shown in Section 3, the conditions
of the Lemma hold for $\psi(\theta) = \theta^2/4$ with $\gamma(r) = \log(1/r)$ and also for the
$\psi_c$ from (6) with $\gamma(r) = c^{-1}\log(1/r)$.
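The concavity asserted by the Lemma can be checked numerically for $\psi(\theta) = \theta^2/4$ and $\gamma(r) = \log(1/r)$, for which $\xi(r) = \log(r_0/r)$ and $\xi''(r) = \xi'(r)^2$. The Python sketch below uses the closed form $G(\eta) = \eta - \eta^2$ for $\eta < 1/2$ and $G(\eta) = 1/4$ otherwise (my derivation under that choice of $\psi$) and tests second differences on an equally spaced grid. The value of `r0` and the grid size are arbitrary.

```python
import math


def G(eta):
    """Closed form of G for the assumed choice psi(theta) = theta^2 / 4."""
    return eta - eta * eta if eta < 0.5 else 0.25


def L(r, r0):
    """L(r) = exp(G(xi(r))) with xi(r) = log(r0 / r)."""
    return math.exp(G(math.log(r0 / r)))


r0 = 0.8
rs = [r0 * k / 400.0 for k in range(1, 401)]  # equally spaced grid on (0, r0]
vals = [L(r, r0) for r in rs]

# A concave function has nonpositive second differences on an equally spaced grid.
second_diffs = [
    vals[k - 1] - 2.0 * vals[k] + vals[k + 1] for k in range(1, len(vals) - 1)
]
max_second_diff = max(second_diffs)
print(max_second_diff)
```

A grid check cannot prove concavity, but a positive second difference would expose an error in the closed form or in the hypotheses.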
Remarks.

(i) If $\gamma(0)$ were finite, the inequality asserted by Theorem 1
could not hold for all nonempty $A$ and all $X$. For example,
if each $X_i$ had a nonatomic distribution and $A$ were
a singleton set we would have $F_\psi(X, A) = n\psi(1)$ almost
surely. The quantity $\mathbb{P}\exp(F_\psi(X, A))$ would exceed
$\exp(\gamma(0))$ for large enough $n$. It is to avoid this difficulty
that we need $\gamma(0) = \infty$.

(ii) Assumption (ii) of the Theorem, which is essentially an
assumption that the asserted inequality holds for $n = 1$,
is easy to check if $\gamma$ is a convex function with $\gamma(1) \ge 0$.
For then the function $B(p) := \exp(\gamma(p))$ is convex with
$B(1) \ge 1$ and $B'(1) = \gamma'(1)e^{\gamma(1)}$. We have

$B(p) \ge (1 - p)e^{\psi(1)} + p$ for all $p$ in $[0, 1]$

if $B'(1) \le 1 - e^{\psi(1)}$.

(iii) I had hoped to extend the proof to cover the case $c = 0$ but
I then ran into problems with $\gamma(0) = \infty$.

2. Proof of Theorem 1

Argue by induction on $n$. As a way of keeping the notation straight,
replace the subscript on $F_\psi(x, B)$ by an $n$ when the argument $B$ is a subset
of $\mathcal{X}^n$. Also, work with the product measure $Q = \otimes_{i \le n} Q_i$ for the
distribution of $X$ and $Q_{-n} = \otimes_{i < n} Q_i$ for the distribution of $(X_1, \dots, X_{n-1})$. The
assertion of the Theorem then becomes

$Q\exp\left(F_n(x, A)\right) \le \exp\left(\gamma(QA)\right)$.

For $n = 1$ and $B \in \mathcal{A}$ we have $F_1(x, B) = \psi(1)\{x \notin B\} + 0\{x \in B\}$ so
that $Q_1\exp\left(F_1(x, B)\right) \le (1 - p)e^{\psi(1)} + p$, where $p = Q_1 B$. Assumption (ii)
then gives the desired $\exp(\gamma(p))$ bound.

Now suppose that $n > 1$ and that the inductive hypothesis is valid for
dimensions strictly smaller than $n$. Write $Q$ as $Q_{-n} \otimes Q_n$. To simplify
notation, write $w$ for $x_{-n} := (x_1, \dots, x_{n-1})$ and $z$ for $x_n$. Define the cross
section $A_z := \{w \in \mathcal{X}^{n-1} : (w, z) \in A\}$ and write $R_z$ for $Q_{-n}A_z$. Define
$r_0 := \sup_{z \in \mathcal{X}} R_z$. Notice that $r_0 \ge Q_n R_z = QA$.

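The key to the proof is a recursive bound for $F_n$: for each $x = (w, z)$
with $A_z \ne \emptyset$, each $m$ with $A_m \ne \emptyset$, and all $\theta \in [0, 1]$,

(7)  $F_n(x, A) \le \theta F_{n-1}(w, A_z) + \bar\theta F_{n-1}(w, A_m) + \psi(\bar\theta)$ where $\bar\theta := 1 - \theta$.

To establish inequality (7), suppose $\mu_z$ is a probability measure
concentrated on $A_z$ and $\mu_m$ is a probability measure concentrated on $A_m$. For a $\theta$
in $[0, 1]$, define $\nu = \theta\mu_z \otimes \delta_z + \bar\theta\mu_m \otimes \delta_m$, a probability measure concentrated
on the subset $(A_z \times \{z\}) \cup (A_m \times \{m\})$ of $A$. Notice that, for $i < n$,

$\nu\{y \in A : y_i \ne x_i\} = \theta\mu_z\{y \in A_z : y_i \ne x_i\} + \bar\theta\mu_m\{y \in A_m : y_i \ne x_i\}$,

while $\nu\{y \in A : y_n \ne x_n\} \le \bar\theta$.

By the definition of $F_n$ and the convexity of $\psi$,

$F_n(x, A) \le \sum_{i \le n} \psi\left(\nu\{y_i \ne x_i\}\right)$
$\le \theta\sum_{i < n} \psi\left(\mu_z\{y_i \ne x_i\}\right) + \bar\theta\sum_{i < n} \psi\left(\mu_m\{y_i \ne x_i\}\right) + \psi(\bar\theta)$.

The two sums over the first $n - 1$ coordinates are like those that appear in
the definitions of $F_{n-1}(w, A_z)$ and $F_{n-1}(w, A_m)$. Indeed, taking an infimum
over all $\mu_z \in \mathcal{P}(A_z)$ and $\mu_m \in \mathcal{P}(A_m)$ we get the expression on the
right-hand side of (7).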

Take exponentials of both sides of (7) then integrate out with respect
to $Q_{-n}$ over the $w$ component. For $0 < \theta < 1$ invoke the Hölder
inequality, $Q_{-n}U^{\theta}V^{\bar\theta} \le (Q_{-n}U)^{\theta}(Q_{-n}V)^{\bar\theta}$, with $U = \exp(F_{n-1}(w, A_z))$ and
$V = \exp(F_{n-1}(w, A_m))$, for a fixed $m$. For each $z$ with $A_z \ne \emptyset$ we get

$Q_{-n}\exp\left(F_n((w, z), A)\right) \le e^{\psi(\bar\theta)}\left(Q_{-n}\exp(F_{n-1}(w, A_z))\right)^{\theta}\left(Q_{-n}\exp(F_{n-1}(w, A_m))\right)^{\bar\theta}$.

The inequality also holds in the extreme cases where $\theta = 0$ or $\theta = 1$, by
continuity. The inductive hypothesis bounds the last product by

$\exp\left(\theta\gamma(R_z) + \bar\theta\gamma(R_m) + \psi(\bar\theta)\right) = \exp\left(\gamma(R_m) + G(\gamma(R_z) - \gamma(R_m), \theta)\right)$.

The exponent is a decreasing function of $R_m$. Take an infimum over $m$, to
replace $\gamma(R_m)$ by $\gamma(r_0)$. Then take an infimum over $\theta$ to get

(9)  $Q_{-n}\exp\left(F_n((w, z), A)\right) \le \exp\left(\gamma(r_0) + G(\xi(R_z))\right)$

where $\xi(r) := \gamma(r) - \gamma(r_0)$ for $0 \le r \le r_0$.

If the cross section $A_z$ is empty, the set $\mathcal{P}(A_z)$ is empty. The argument
leading from (7) to (9) still works if we fix $\theta$ equal to zero throughout,
giving the bound

$Q_{-n}\exp\left(F_n(x, A)\right) \le \exp\left(\gamma(r_0) + \psi(1)\right)$ if $A_z = \emptyset$.

Thus the inequality (9) also holds with $R_z = 0$ when $A_z = \emptyset$, because
$\xi(0) = \gamma(0) - \gamma(r_0) = \infty$ and $G(\infty) = \psi(1)$.

By Assumption (i), the function $r \mapsto \exp\left(G(\xi(r))\right)$ is concave on $[0, r_0]$.
Integrate both sides of (9) with respect to $Q_n$ to average out over the $z$
variable. Then invoke Jensen's inequality and the fact that $Q_n R_z = QA$, to
deduce that

$Q\exp\left(F_n(x, A)\right) \le \exp\left(\gamma(r_0) + G(\gamma(QA) - \gamma(r_0))\right)$.

Finally, use the inequality $G(\eta) \le \eta$ to bound the last expression by $\exp(\gamma(QA))$,
thereby completing the inductive step.

Remark. Note that it is important to integrate with respect
to $Q_n$ before using the bound on $G$: the upper bound $\exp(\gamma(R_z))$
is a convex function of $R_z$, not concave.
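The two inequalities that drive the inductive step can be illustrated on a toy discrete measure. The Python sketch below checks the Hölder step for arbitrary positive $U$ and $V$, then the Jensen step for $L(r) = \exp(G(\log(r_0/r)))$, again assuming $\psi(\theta) = \theta^2/4$ and $\gamma(r) = \log(1/r)$. The weights `q` and the values `U`, `V`, `R` are hypothetical and are not derived from any particular set $A$.

```python
import math


def G(eta):
    """Closed form of G for the assumed choice psi(theta) = theta^2 / 4."""
    return eta - eta * eta if eta < 0.5 else 0.25


# Hypothetical discrete measure q and positive functions U, V on three points.
q = [0.2, 0.5, 0.3]
U = [1.0, 2.5, 4.0]
V = [3.0, 1.5, 2.0]
theta = 0.3

# Hölder step: Q(U^theta V^(1-theta)) <= (QU)^theta (QV)^(1-theta).
holder_lhs = sum(qi * u ** theta * v ** (1.0 - theta) for qi, u, v in zip(q, U, V))
holder_rhs = (
    sum(qi * u for qi, u in zip(q, U)) ** theta
    * sum(qi * v for qi, v in zip(q, V)) ** (1.0 - theta)
)

# Jensen step: average the concave function L over hypothetical values of R_z.
R = [0.1, 0.4, 0.8]
r0 = max(R)


def L(r):
    """L(r) = exp(G(log(r0 / r))), concave on (0, r0]."""
    return math.exp(G(math.log(r0 / r)))


mean_R = sum(qi * r for qi, r in zip(q, R))  # plays the role of QA
jensen_lhs = sum(qi * L(r) for qi, r in zip(q, R))
jensen_rhs = L(mean_R)

# Final step: exp(gamma(r0)) * L(QA) <= exp(gamma(QA)) = 1 / QA, via G(eta) <= eta.
final_lhs = jensen_rhs / r0
final_rhs = 1.0 / mean_R
print(holder_lhs <= holder_rhs, jensen_lhs <= jensen_rhs, final_lhs <= final_rhs)
```

The order of operations mirrors the Remark: the average over $z$ is taken while the bound still involves the concave function $L$, and only afterwards is $G(\eta) \le \eta$ applied.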

3. Proof of the Concavity Lemma 

I will establish a more detailed set of results than asserted by Lemma 2.
Invoke the monotonicity and continuity of $\psi'$ to define $g(\eta)$ as the solution
to $\psi'(1 - g(\eta)) = \eta$ if $0 \le \eta < \psi'(1)$ and $g(\eta) = 0$ if $\psi'(1) \le \eta$. Then the
following assertions are true.

(i) $G(\eta) = \psi(1 - g(\eta)) + \eta g(\eta)$ for $0 \le \eta < \psi'(1)$, and $G(\eta) = \psi(1)$ for $\psi'(1) \le \eta$.

(ii) $G$ is increasing and concave, with a continuous, decreasing first
derivative $g$. In particular, $G(0) = 0$ and $G'(0) = g(0) = 1$.

(iii) $G''(\eta) = g'(\eta) = -\left[\psi''(1 - g(\eta))\right]^{-1}$ for $0 < \eta < \psi'(1)$.

(iv) $G(\eta) \le \eta$ for all $\eta \in \mathbb{R}_+$.

(v) Suppose $\xi : J \to \mathbb{R}_+$ is a convex function defined on a subinterval
$J$ of the real line, with $\xi' \ne 0$ on the interior of $J$. Suppose

$\psi''(1 - \theta) \le \left(\theta^2 + \theta\,\xi''(r)/\xi'(r)^2\right)^{-1}$

for all $r$ in the interior of $J$ for which $\theta := g(\xi(r)) \in (0, 1)$. Then
$r \mapsto \exp\left(G(\xi(r))\right)$ is a concave function on $J$.

Proof of (i) through (iv). The fact that $G$ is concave and increasing follows
from its definition as an infimum of increasing linear functions of $\eta$. (It
would also follow from the fact that $G'(\eta) = g(\eta)$, which is nonnegative
and decreasing.) Replacement of the infimum over $0 \le \theta \le 1$ by the value
at $\theta = 1$ gives the inequality $G(\eta) \le \eta$.

If $\eta \ge \psi'(1)$, the derivative $-\psi'(1 - \theta) + \eta$ is nonnegative on $[0, 1]$, which
ensures that the infimum is achieved at $\theta = 0$.

If $0 \le \eta < \psi'(1)$, the infimum is achieved at the zero of the derivative,
$\theta = g(\eta)$. Differentiation of the defining equality $\psi'(1 - g(\eta)) = \eta$ then
gives the expression for $g'(\eta)$. Similarly

$G'(\eta) = -\psi'(1 - g(\eta))\,g'(\eta) + \eta g'(\eta) + g(\eta) = g(\eta)$.

The infimum that defines $G(0)$ is achieved at $g(0) = 1$, which gives
$G(0) = \psi(0) = 0$. Continuity of $g$ at $0$ then gives $G'(0) = g(0) = 1$.
Proof of (v). Note that the function $L(r) := \exp\left(G(\xi(r))\right)$ is continuous
on $J$ and takes the value $e^{\psi(1)}$ for all $r$ at which $\xi(r) \ge \psi'(1)$. The second
derivative $L''(r)$ exists except possibly at points $r$ for which $\xi(r) = \psi'(1)$.
In particular, $L''(r) = 0$ when $\xi(r) > \psi'(1)$ and

$L''(r) = \left(g'(\xi_r)(\xi'_r)^2 + g(\xi_r)\xi''_r + g(\xi_r)^2(\xi'_r)^2\right)L(r)$ for $0 < \xi(r) < \psi'(1)$,

where $\xi_r$, $\xi'_r$, $\xi''_r$ abbreviate $\xi(r)$, $\xi'(r)$, $\xi''(r)$.
From (iii) and the positivity of $L$, the last expression is $\le 0$ if and only if

$-\dfrac{(\xi'_r)^2}{\psi''(1 - g(\xi_r))} + g(\xi_r)\xi''_r + g(\xi_r)^2(\xi'_r)^2 \le 0$.

Divide through by $(\xi'_r)^2$ then rearrange to get the asserted inequality for $\psi''$.
Lemma 2 follows as a special case of (i) through (iv).

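Special cases. If $\sup_r \xi''(r)/\xi'(r)^2 \le c$, with $c$ a positive constant, the
inequality from part (v) will certainly hold if

(10)  $\psi''(1 - \theta) \le (\theta^2 + c\theta)^{-1}$ for all $0 < \theta < 1$.

This differential inequality can be solved, subject to the constraints $0 =
\psi(0) = \psi'(0)$, by two integrations. Then

$\psi'(1 - \theta) = \int_\theta^1 \psi''(1 - t)\,dt \le \int_\theta^1 \frac{dt}{t^2 + ct} = c^{-1}\left(-\log\theta + \log\left(\frac{\theta + c}{1 + c}\right)\right)$

and, with $\psi_c$ defined by (6),

$\psi(1 - \theta) = \int_\theta^1 \psi'(1 - t)\,dt \le c^{-1}\int_\theta^1 \left(-\log t + \log\left(\frac{t + c}{1 + c}\right)\right)dt = \psi_c(1 - \theta)$.

Note that $\psi_c(1 - \theta)$ is the solution to the differential equation

$\psi_c''(1 - \theta) = \dfrac{1}{\theta^2 + c\theta}$ for all $0 < \theta < 1$, with $\psi_c(0) = \psi_c'(0) = 0$.

It is the largest solution to (10).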
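The differential equation for $\psi_c$ can be checked by finite differences. The Python sketch below assumes my reading of definition (6), $\psi_c(\theta) = c^{-1}[(1-\theta)\log(1-\theta) - (1-\theta+c)\log((1-\theta+c)/(1+c))]$, and compares a central second difference of $\psi_c$ at $1 - \theta$ with $1/(\theta^2 + c\theta)$. The value of $c$, the step size, and the grid are arbitrary choices.

```python
import math


def psi_c(theta, c):
    """psi_c(theta) as read from definition (6), with 0 * log(0) taken as 0."""
    u = 1.0 - theta
    first = u * math.log(u) if u > 0.0 else 0.0
    return (first - (u + c) * math.log((u + c) / (1.0 + c))) / c


def second_derivative(f, t, step=1e-4):
    """Central finite-difference approximation to f''(t)."""
    return (f(t - step) - 2.0 * f(t) + f(t + step)) / (step * step)


c = 1.5
thetas = [k / 10.0 for k in range(1, 10)]  # theta = 0.1, ..., 0.9

# Relative error of psi_c''(1 - theta) against 1 / (theta^2 + c * theta).
max_rel_err = max(
    abs(second_derivative(lambda t: psi_c(t, c), 1.0 - th) * (th * th + c * th) - 1.0)
    for th in thetas
)
print(max_rel_err)
```

The finite-difference error is dominated by truncation and rounding, so agreement to four or five digits is all that should be expected from this check.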